Clustering

Introduction

In this homework, we perform a clustering analysis on a dataset that tracks changes in uninsured rates across different regions or groups from 2010 to 2015. The dataset has two main columns, ‘Uninsured Rate (2010)’ and ‘Uninsured Rate (2015)’, which give the proportion of the population that was uninsured at each point in time. Together they make it possible to observe shifts in insurance coverage over a five-year period.

The primary objective of our clustering analysis is to identify patterns and groupings within the data that can reveal underlying trends in insurance coverage changes. By applying k-means, DBSCAN, and Hierarchical clustering methods, we aim to uncover distinct clusters that can help us understand how and where significant shifts in uninsured rates have occurred. For instance, through this analysis, we might be able to pinpoint regions or groups that have seen substantial improvements in coverage, as well as those where coverage has perhaps declined or not improved significantly. This information is crucial for policy analysis, allowing us to target areas that may need more attention or to understand the impacts of healthcare policies over this period.

Theory

K-Means Clustering: K-Means clustering is a widely used technique that groups data into a predetermined number of clusters, much like sorting different fruits into set baskets based on their attributes. In this method, each data point is assigned to the closest of the ‘k’ cluster centers. These centers, initially selected at random, are then moved to the mean of the points assigned to them, which reduces the total squared distance between the centers and their associated data points, akin to finding the optimal central location for each cluster. The assignment and update steps are repeated until the centers stabilize, signifying well-defined, compact clusters.
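The assign-then-update loop described above can be sketched in plain Python. This is a minimal illustration rather than the implementation used for this homework: the function name `lloyd_kmeans`, the helper `sq_dist`, and the six toy points are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical minimal K-Means: random initial centers, then iterate."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers picked at random
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # centers stabilized
            break
        centers = new_centers
    return centers, labels


# Toy data (made up for illustration): two compact groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = lloyd_kmeans(points, k=2)
print(labels)
```

On data this well separated, the loop settles after a few passes with each group of three sharing one label.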

To determine the optimal number of clusters, the ‘Elbow Method’ is often employed. This involves applying K-Means with various ‘k’ values and plotting the total within-cluster variance against the number of clusters. The ‘elbow’ point in the graph, where the rate of variance decrease changes markedly, is considered an appropriate estimate for the number of clusters.
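As a rough sketch of this procedure, the snippet below fits scikit-learn's `KMeans` for several values of k and records `inertia_` (the within-cluster sum of squares) for each. The data are synthetic stand-ins for (2010 rate, 2015 rate) pairs, generated only for illustration, and are not the homework dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in data (an assumption): two groups of rate pairs.
X = np.vstack([
    rng.normal(loc=[0.10, 0.06], scale=0.01, size=(50, 2)),
    rng.normal(loc=[0.20, 0.15], scale=0.01, size=(50, 2)),
])

inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares

# How much each extra cluster reduces the inertia; the elbow is where
# these reductions shrink sharply.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
for k, value in zip(range(1, 7), inertias):
    print(k, round(value, 4))
```

Plotting `inertias` against k (for example with matplotlib) gives the usual elbow curve; here the drop from k=1 to k=2 dwarfs every later drop because the synthetic data were built with two groups.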

Hierarchical Clustering: Hierarchical clustering constructs a cluster hierarchy or tree. It can be envisioned as assembling a family tree, where each data point starts as an individual cluster, and pairs of clusters are progressively merged. There are two types: ‘Agglomerative’, which begins with each point as a separate cluster and combines them, and ‘Divisive’, which starts with a single cluster and divides it. The common approach is agglomerative, where the nearest clusters are merged at each step, creating a dendrogram that illustrates the clusters’ arrangement.

The number of clusters is determined by cutting the dendrogram at a level that segregates the data suitably. Identifying ‘large jumps’ in the dendrogram can indicate natural divisions between clusters.
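One way to make the ‘large jump’ idea concrete is to look at the merge heights recorded by SciPy's `linkage` and cut the tree in the middle of the biggest gap between consecutive heights. The sketch below assumes Ward linkage and three synthetic blobs; both choices are illustrative and not taken from the homework data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Synthetic data (an assumption): three well-separated blobs of 20 points.
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in blob_centers])

Z = linkage(X, method="ward")   # agglomerative merges, one row per merge
heights = Z[:, 2]               # distance at which each merge happened
gaps = np.diff(heights)         # jump between consecutive merge heights
i = int(np.argmax(gaps))        # position of the largest jump

# Cut the dendrogram halfway through the largest jump.
threshold = (heights[i] + heights[i + 1]) / 2
labels = fcluster(Z, t=threshold, criterion="distance")
n_clusters = len(np.unique(labels))
print(n_clusters)
```

`scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself, which is useful for checking by eye that the chosen cut is sensible.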

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Unlike K-Means, DBSCAN does not require the number of clusters to be specified in advance. Instead, it identifies clusters as areas of high density separated by areas of low density. This is comparable to watching a flock of birds: those flying closely together are grouped, and solitary ones are marked as outliers.

This method requires two parameters: ‘epsilon’, determining the proximity needed for points to be part of the same cluster, and ‘min_samples’, defining the minimum number of points required to form a dense region. DBSCAN excels in identifying clusters of irregular shapes and adapts to the data’s structure, unlike K-Means, which presumes spherical clusters.
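A small sketch with scikit-learn's `DBSCAN` shows how the two parameters behave. The points are constructed by hand (two tight 5x5 grids plus one isolated point) and the values `eps=0.25` and `min_samples=5` are chosen to suit that toy layout; none of it comes from the homework data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (an assumption): two dense 5x5 grids with 0.1 spacing,
# plus a single far-away point that should be flagged as noise.
offsets = np.arange(5) * 0.1
grid = np.array([(x, y) for x in offsets for y in offsets])
X = np.vstack([grid, grid + 5.0, [[20.0, 20.0]]])

# eps: neighborhood radius; min_samples: points needed for a dense region.
model = DBSCAN(eps=0.25, min_samples=5).fit(X)
labels = model.labels_            # label -1 marks noise points

n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print(n_clusters, n_noise)
```

No cluster count is passed in: the two grids become clusters because they are dense, and the isolated point is labeled -1 because it has no neighbors within `eps`.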

These clustering methods are essential for uncovering patterns in data, aiding in the comprehension of complex datasets and facilitating informed decision-making based on these analyses.

Clustering Search Words

To cluster textual data effectively (this analysis uses record data, but the steps are noted here for future reference), several preprocessing steps are needed to refine the data: eliminating stop words, stripping out non-alphabetic characters including punctuation, and applying lemmatization where appropriate. These steps are covered in detail in the Data Cleaning section. Once the text data is preprocessed, it is ready for visualization.
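The cleaning steps listed above might look roughly like the following. This is a simplified sketch: the stop-word set is a tiny hypothetical sample, the function name `preprocess` is made up, and lemmatization is left out because it needs an external library.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are longer.
STOP_WORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "for", "on"}


def preprocess(text):
    """Lowercase, strip non-alphabetic characters, and drop stop words."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)  # punctuation, digits
    tokens = letters_only.split()
    # Also drop one-letter fragments left behind by stripped apostrophes.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


tokens = preprocess("The ACA's Medicaid expansion, in 2014!")
print(tokens)  # ['aca', 'medicaid', 'expansion']
```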

In an optimal scenario, clustering this text data would result in three clear categories: aca, medicine, and medicaid. The objective is to have each cluster distinctly associated with one of these categories. This classification would then provide deeper insight into the varied impacts of each of these topics in the U.S.

https://newsapi.org/v2/everything
topic =  ACA in ACA
TITLES= ['Dr. Y. S. Rajashekar Reddy ACA–VDCA Cricket Stadium', 'Assam Cricket Association Stadium, Guwahati', '2022 ACA Africa T20 Cup']
-------------------------
Dr. Y. S. Rajashekar Reddy ACA–VDCA Cricket Stadium
https://en.wikipedia.org/wiki/Dr._Y._S._Rajashekar_Reddy_ACA%E2%80%93VDCA_Cricket_Stadium
['Ground profile', 'History', 'Stats & records', 'List of centuries', 'Key', 'Tests', 'ODIs', 'List of five wicket hauls', 'Key', 'Tests', 'One Day Internationals', 'Notable events', 'Gallery', 'References', 'External links']
-------------------------
Assam Cricket Association Stadium, Guwahati
https://en.wikipedia.org/wiki/Assam_Cricket_Association_Stadium,_Guwahati
['History', 'List of centuries', 'Key', 'One day internationals', 'T20 internationals', 'See also', 'References', 'External links']
-------------------------
2022 ACA Africa T20 Cup
https://en.wikipedia.org/wiki/2022_ACA_Africa_T20_Cup
['North-Western qualifier', 'Points table', 'Fixtures', 'Semi-finals', 'Final', 'Southern qualifier', 'Points table', 'Fixtures', 'ACA Africa T20 Cup Finals', 'Squads', 'Group stage', 'Group A', 'Group B', 'Semi-finals', 'Final', 'References', 'External links']
topic =  Medicine Price in ACA
TITLES= ['Affordable Care Act', 'Healthcare in the United States', 'Health care prices in the United States']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Healthcare in the United States
https://en.wikipedia.org/wiki/Healthcare_in_the_United_States
['History', 'Statistics', 'Hospitalizations', 'Health insurance and accessibility', 'Health in the US in global context', 'Causes of mortality in the US', 'Providers', 'Facilities', 'Physicians (M.D. and D.O.)', 'Medical products, research, and development', 'Healthcare provider employment in the US', 'Alternative medicine', 'Spending', 'Regulation and oversight', 'Involved organizations and institutions', '"Certificates of need" for hospitals', 'Licensing of providers', 'Emergency Medical Treatment and Active Labor Act (EMTALA)', 'Quality assurance', 'Overall system effectiveness', 'Measures of effectiveness', 'Waiting times', 'Population health: quality, prevention, vulnerable populations', 'Innovation: workforce, healthcare IT, R&D', 'Compared to other countries', 'System efficiency and equity', 'Efficiency', 'Preventable deaths', 'Value for money', 'Delays in seeking care and increased use of emergency care', 'Variations in provider practices', 'Care coordination', 'Administrative costs', 'Long-term living facilities', 'Third-party payment problem and consumer-driven insurance', 'Equity', 'Mental health', 'Oral health', 'Medical underwriting and the uninsurable', 'Demographic differences', 'Prescription drug issues', 'Drug efficiency and safety', 'Prescription drug prices', 'Impact of drug companies', 'Healthcare reform debate', 'Patient Protection and Affordable Care Act (2010)', 'Health insurance coverage for immigrants', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Health care prices in the United States
https://en.wikipedia.org/wiki/Health_care_prices_in_the_United_States
['Nature of the healthcare markets', 'Coverage', 'Price transparency issues', 'Government-mandated critical care', 'Healthcare is not a typical market', 'Medicare and Medicaid', 'Employer-based market', 'Affordable Care Act (ACA) marketplaces', 'Deductibles', 'Prescription drugs', 'Reasons for higher costs', 'Relative to other countries', 'Relative to prior years', 'See also', 'References', 'Further reading', 'External links']
topic =  Medicaid USA in ACA
TITLES= ['Medicaid', 'Affordable Care Act', 'Families USA']
-------------------------
Medicaid
https://en.wikipedia.org/wiki/Medicaid
['Features', 'History', 'Expansion under the Affordable Care Act', 'State implementations', 'Differences by state', 'Political influences', 'Eligibility and coverage', 'Reimbursement for care providers', 'Enrollment', 'Comparisons with Medicare', 'Benefits', 'Dental', 'Eligibility', 'PPACA income test standardization', 'Non-PPACA eligibility', 'Supplemental Security Income beneficiaries', 'Five year "look-back"', 'Immigration status', 'Children and SCHIP', 'HIV', 'Utilization', 'Budget and financing', 'Effects', 'Coverage gains', 'Mortality and disability reduction', 'Rural hospitals boosted revenue', 'Financial and health security increase', 'Political participation increase', 'Crime reduction', 'Oregon Medicaid health experiment and controversy', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Families USA
https://en.wikipedia.org/wiki/Families_USA
['History', 'Background', 'See also', 'References', 'External links']
topic =  ACA in Medicine
TITLES= ['Dr. Y. S. Rajashekar Reddy ACA–VDCA Cricket Stadium', 'Emergency medicine', 'Affordable Care Act']
-------------------------
Dr. Y. S. Rajashekar Reddy ACA–VDCA Cricket Stadium
https://en.wikipedia.org/wiki/Dr._Y._S._Rajashekar_Reddy_ACA%E2%80%93VDCA_Cricket_Stadium
['Ground profile', 'History', 'Stats & records', 'List of centuries', 'Key', 'Tests', 'ODIs', 'List of five wicket hauls', 'Key', 'Tests', 'One Day Internationals', 'Notable events', 'Gallery', 'References', 'External links']
-------------------------
Emergency medicine
https://en.wikipedia.org/wiki/Emergency_medicine
['Scope', 'Work patterns', 'History', 'Financing and practice organization', 'Reimbursement', 'Compensation', 'Payment systems', 'Overutilization', 'Uncompensated care', 'EMTALA', 'Care delivery in different ED settings', 'Rural', 'Urban', 'Patient–provider relationships', 'Medical error', 'Treatments', 'Training', 'Argentina', 'Australia and New Zealand', 'Belgium', 'Brazil', 'Chile', 'Canada', 'China', 'Germany', 'India', 'Malaysia', 'Saudi Arabia', 'Switzerland', 'United States', 'Funding for training', 'United Kingdom', 'Turkey', 'Pakistan', 'Iran', 'Ethical and medicolegal issues', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
topic =  Medicine Price in Medicine
TITLES= ['Medicine', 'Veterinary medicine', 'Emergency medicine']
-------------------------
Medicine
https://en.wikipedia.org/wiki/Medicine
['Etymology', 'Clinical practice', 'Institutions', 'Delivery', 'Branches', 'Basic sciences', 'Specialties', 'Surgical specialty', 'Internal medicine specialty', 'Diagnostic specialties', 'Other major specialties', 'Interdisciplinary fields', 'Education and legal controls', 'Medical ethics', 'History', 'Ancient world', 'Middle Ages', 'Modern', 'Quality, efficiency, and access', 'See also', 'References']
-------------------------
Veterinary medicine
https://en.wikipedia.org/wiki/Veterinary_medicine
['History', 'Premodern era', 'Establishment of profession', 'Veterinary workers', 'Veterinary physicians', 'Paraveterinary workers', 'Allied professions', 'Veterinary research', 'Clinical veterinary research', 'See also', 'By country', 'References', 'Further reading', 'Introductory textbooks and references', 'Monographs and other speciality texts', 'Veterinary nursing, ophthalmology, and pharmacology', 'Related fields', 'External links']
-------------------------
Emergency medicine
https://en.wikipedia.org/wiki/Emergency_medicine
['Scope', 'Work patterns', 'History', 'Financing and practice organization', 'Reimbursement', 'Compensation', 'Payment systems', 'Overutilization', 'Uncompensated care', 'EMTALA', 'Care delivery in different ED settings', 'Rural', 'Urban', 'Patient–provider relationships', 'Medical error', 'Treatments', 'Training', 'Argentina', 'Australia and New Zealand', 'Belgium', 'Brazil', 'Chile', 'Canada', 'China', 'Germany', 'India', 'Malaysia', 'Saudi Arabia', 'Switzerland', 'United States', 'Funding for training', 'United Kingdom', 'Turkey', 'Pakistan', 'Iran', 'Ethical and medicolegal issues', 'See also', 'References', 'Further reading', 'External links']
topic =  Medicaid USA in Medicine
TITLES= ['Medicaid', 'Affordable Care Act', 'Internal medicine']
-------------------------
Medicaid
https://en.wikipedia.org/wiki/Medicaid
['Features', 'History', 'Expansion under the Affordable Care Act', 'State implementations', 'Differences by state', 'Political influences', 'Eligibility and coverage', 'Reimbursement for care providers', 'Enrollment', 'Comparisons with Medicare', 'Benefits', 'Dental', 'Eligibility', 'PPACA income test standardization', 'Non-PPACA eligibility', 'Supplemental Security Income beneficiaries', 'Five year "look-back"', 'Immigration status', 'Children and SCHIP', 'HIV', 'Utilization', 'Budget and financing', 'Effects', 'Coverage gains', 'Mortality and disability reduction', 'Rural hospitals boosted revenue', 'Financial and health security increase', 'Political participation increase', 'Crime reduction', 'Oregon Medicaid health experiment and controversy', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Internal medicine
https://en.wikipedia.org/wiki/Internal_medicine
['Etymology and historical development', 'Role of internal medicine specialists', 'Education and training', 'Certification', 'Subspecialties', 'United States of America', 'American Board of Internal Medicine', 'American College of Osteopathic Internists', 'United Kingdom', 'European Union', 'Australia', 'Canada', 'Medical diagnosis and treatment', 'Gathering data', 'Generating diagnostic hypotheses', 'Communication', 'Treatment', 'Prevention and other services', 'Ethics', 'Patient-physician relationship', 'Treatment and telemedicine', 'Financial issues and conflicts of interest', 'Other topics', 'See also', 'References', 'Further reading', 'External links']
topic =  ACA in Medicaid
TITLES= ['Medicaid', 'Affordable Care Act', 'Medicaid coverage gap']
-------------------------
Medicaid
https://en.wikipedia.org/wiki/Medicaid
['Features', 'History', 'Expansion under the Affordable Care Act', 'State implementations', 'Differences by state', 'Political influences', 'Eligibility and coverage', 'Reimbursement for care providers', 'Enrollment', 'Comparisons with Medicare', 'Benefits', 'Dental', 'Eligibility', 'PPACA income test standardization', 'Non-PPACA eligibility', 'Supplemental Security Income beneficiaries', 'Five year "look-back"', 'Immigration status', 'Children and SCHIP', 'HIV', 'Utilization', 'Budget and financing', 'Effects', 'Coverage gains', 'Mortality and disability reduction', 'Rural hospitals boosted revenue', 'Financial and health security increase', 'Political participation increase', 'Crime reduction', 'Oregon Medicaid health experiment and controversy', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Medicaid coverage gap
https://en.wikipedia.org/wiki/Medicaid_coverage_gap
['Population characteristics', 'Medicaid expansion', 'Affordable Care Act provision', '<i>National Federation of Independent Business v. Sebelius</i> (2012)', 'States adopting Medicaid expansion after ACA enactment', 'Maine', 'Oklahoma', 'South Dakota', 'Utah', 'See also', 'Notes', 'References']
topic =  Medicine Price in Medicaid
TITLES= ['Affordable Care Act', 'Medicaid', '340B Drug Pricing Program']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Medicaid
https://en.wikipedia.org/wiki/Medicaid
['Features', 'History', 'Expansion under the Affordable Care Act', 'State implementations', 'Differences by state', 'Political influences', 'Eligibility and coverage', 'Reimbursement for care providers', 'Enrollment', 'Comparisons with Medicare', 'Benefits', 'Dental', 'Eligibility', 'PPACA income test standardization', 'Non-PPACA eligibility', 'Supplemental Security Income beneficiaries', 'Five year "look-back"', 'Immigration status', 'Children and SCHIP', 'HIV', 'Utilization', 'Budget and financing', 'Effects', 'Coverage gains', 'Mortality and disability reduction', 'Rural hospitals boosted revenue', 'Financial and health security increase', 'Political participation increase', 'Crime reduction', 'Oregon Medicaid health experiment and controversy', 'See also', 'References', 'Further reading', 'External links']
-------------------------
340B Drug Pricing Program
https://en.wikipedia.org/wiki/340B_Drug_Pricing_Program
['Program description and history', 'Administration', 'Eligibility', 'The Disproportionate Share Hospital (DSH) Adjustment Percentage', 'Expansion', 'Covered Entity Eligibility', 'Patient definition', 'Contract pharmacies', 'Government reports', 'Contract Pharmacy Arrangements', 'Manufacturer Discounts Offer Benefits, but Federal Oversight Needs Improvement', 'State Medicaid Policies and Oversight Activities', 'Pharmaceutical Manufacturers Overcharged 340B-Covered Entities', 'Other reports', 'Analysis of 340B DSH Hospital Services Delivered to Vulnerable Patient Populations', 'Outpatient Prescription Dispensing Patterns Through Contract Pharmacies In 2012', "Federal Drug Discount Program Critical for Oregon's Health", 'Unfulfilled Expectations: An analysis of charity care provided by 340B hospitals', 'RAND Corporation 2011 Study', '<i>Charlotte News Observer</i> April 2012 Series: "Prognosis: Profits" and Subsequent Congressional Investigation', 'Safety Net Hospitals for Pharmaceutical Access (SNHPA) Report', 'Legislative oversight', '340B and Medicaid', 'References']
topic =  Medicaid USA in Medicaid
TITLES= ['Medicaid', 'Affordable Care Act', 'Medicaid Estate Recovery Program']
-------------------------
Medicaid
https://en.wikipedia.org/wiki/Medicaid
['Features', 'History', 'Expansion under the Affordable Care Act', 'State implementations', 'Differences by state', 'Political influences', 'Eligibility and coverage', 'Reimbursement for care providers', 'Enrollment', 'Comparisons with Medicare', 'Benefits', 'Dental', 'Eligibility', 'PPACA income test standardization', 'Non-PPACA eligibility', 'Supplemental Security Income beneficiaries', 'Five year "look-back"', 'Immigration status', 'Children and SCHIP', 'HIV', 'Utilization', 'Budget and financing', 'Effects', 'Coverage gains', 'Mortality and disability reduction', 'Rural hospitals boosted revenue', 'Financial and health security increase', 'Political participation increase', 'Crime reduction', 'Oregon Medicaid health experiment and controversy', 'See also', 'References', 'Further reading', 'External links']
-------------------------
Affordable Care Act
https://en.wikipedia.org/wiki/Affordable_Care_Act
['Provisions', 'Insurance regulations: individual policies', 'Individual mandate', 'Exchanges', 'Premium subsidies', 'Cost-sharing reduction subsidies', 'Risk management', 'Risk corridors', 'Reinsurance', 'Risk adjustment', 'Medicaid expansion', 'Medicare savings', 'Taxes', 'Medicare taxes', 'Excise taxes', 'SCHIP', 'Dependents', 'Employer mandate', 'Delivery system reforms', 'Hospital quality', 'Bundled payments', 'Accountable care organizations', 'Medicare drug benefit (Part D)', 'State waivers', 'Other insurance provisions', 'Nutrition labeling requirements', 'Legislative history', 'Individual mandate', 'Academic foundation', 'Healthcare debate, 2008–10', 'Senate', 'House', 'Post-enactment', 'Impact', 'Coverage', 'Taxes', 'Insurance exchanges', 'Medicaid expansion in practice', 'Medicaid expansion by state', 'Insurance costs', 'Deductibles and co-payments', 'Health outcomes', 'Distributional impact', 'Federal deficit', 'CBO estimates of revenue and impact on deficit', 'Opinions on CBO projections', 'Employer mandate and part-time work', 'Hospitals', 'Economic consequences', 'Public opinion', 'Political aspects', '"Obamacare"', 'Common misconceptions', '"Death panels"', 'Members of Congress', 'Illegal immigrants', 'Exchange "death spiral"', '"If you like your plan"', 'Criticism and opposition', 'Legal challenges', '<i>National Federation of Independent Business v. Sebelius</i>', 'Contraception mandate', '<i>King v Burwell</i>', '<i>House v. Price</i>', '<i>United States House of Representatives v. Azar</i>', '<i>California v. Texas</i>', 'Risk corridors', 'Non-cooperation', 'Repeal efforts', '2013 federal government shutdown', '2017 repeal effort', 'Actions to hinder implementation', 'Socialism debate', 'Implementation', 'In popular culture', 'See also', 'References', 'Further reading', 'Preliminary CBO documents', 'CMS Estimates of the impact of P.L. 111-148', 'CMS Estimates of the impact of H.R. 
3590', 'Senate Finance Committee meetings', 'External links', 'ACA text']
-------------------------
Medicaid Estate Recovery Program
https://en.wikipedia.org/wiki/Medicaid_Estate_Recovery_Program
['Details', 'History', 'Non-LTCR estate recovery and ACA', 'View that non-LTCR recovery is problematic', 'Argument for non-LTCR estate recovery since ACA', 'Post-ACA adjustments to recovery regulations', 'State regulation adjustments stopping non-LTCR Estate Recovery', 'States maintaining non-LTCR estate recovery', 'External links', 'References']

As illustrated above, each word cloud showcases a distinct vocabulary, which suggests that the data can be clustered on these linguistic differences. Before text data can be clustered, it must first be vectorized. The resulting vectors are then normalized and clustered with three methods: K-means, DBSCAN, and hierarchical clustering.
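
The vectorization and normalization step could look roughly like the sketch below. It assumes TF-IDF weighting through scikit-learn's `TfidfVectorizer`, which the text above does not specify, and the three short documents are made-up stand-ins for the scraped pages.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the scraped page text (not the real data).
docs = [
    "medicaid expansion coverage exchanges premium subsidies",
    "estate recovery medicaid care regulations",
    "individual mandate coverage taxes legal challenges",
]

# TF-IDF turns each document into a sparse vector of term weights.
# norm="l2" (the default) rescales every row to unit length, so the
# vectorizer also takes care of the normalization step.
vectorizer = TfidfVectorizer(stop_words="english", norm="l2")
X_text = vectorizer.fit_transform(docs)

row_norms = np.linalg.norm(X_text.toarray(), axis=1)
print(X_text.shape)  # (number of documents, vocabulary size)
print(row_norms)     # one Euclidean norm per document
```

Unit-length rows make Euclidean distance between documents depend on word proportions rather than on document length, which matters for all three clustering methods used here.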

K-Means Clustering

K-Means is usually the first method considered when clustering a dataset. It partitions the data into ‘k’ distinct groups using Lloyd’s algorithm, which alternates between two steps: assigning each point to its nearest centroid (the center point of a cluster) and recomputing each centroid as the mean of the points assigned to it. The ‘k’ in K-Means is the number of centroids around which the data is organized.
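
To make the two alternating steps concrete, here is a minimal NumPy sketch of Lloyd's algorithm. The function name `lloyd_kmeans`, its parameters, and the synthetic two-blob data are illustrative assumptions; the analysis itself relies on scikit-learn's `KMeans`.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the clustering is stable
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs (illustrative values only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = lloyd_kmeans(X, k=2)
print(centroids)
```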

The Elbow Method: The elbow method offers a straightforward way to compare candidate ‘k’ values. It relies on a metric such as inertia, the sum of squared Euclidean distances between each point and the centroid of its cluster, or distortion, a closely related measure that is often defined as the average of those squared distances. Plotting the metric against ‘k’ gives a curve that always falls as ‘k’ grows; the point where the curve bends most sharply marks the ‘k’ beyond which additional clusters yield little improvement. For the record data in question, the elbow plot showed a pronounced bend at k=2, suggesting two as a reasonable number of clusters.
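
The numbers behind an elbow plot can be computed as in the following sketch. It uses synthetic two-blob data from `make_blobs` as a stand-in for the real records, so the values it prints are not the ones from this analysis.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with two obvious groups (an assumption).
X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [8, 8]], cluster_std=1.0, random_state=0
)

inertias = {}
for k in range(1, 7):
    # n_init is set explicitly so the result does not depend on the
    # library's default, which changed between scikit-learn versions.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # sum of squared distances to centroids

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting `inertias` against ‘k’ (for example with matplotlib) gives the elbow curve; on this synthetic data the largest drop occurs between k=1 and k=2.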

K-Means Clustering Visualizations:

To illustrate the results of K-means clustering, we begin by selecting the value of ‘k’. Based on the elbow plot, I have chosen ‘k’ values of 2 and 3 for this analysis. Below, you can see the visual representation of the clustering outcomes for these selected ‘k’ values:

Based on the visualizations and statistics, k=3 appears to be the better choice for our clustering analysis, even though the elbow plot bends most sharply at k=2. The k=3 plot displays three distinct categories of uninsured rates, each shown in a different color, highlighting the impact of the Affordable Care Act (ACA) on these groupings.
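
Fits like the two compared above could be produced along the following lines. The column names come from the dataset description, but the rates in the frame are made up for illustration, and the `Cluster (k=...)` column names are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up rates for illustration; the real dataset is not reproduced here.
df = pd.DataFrame({
    "Uninsured Rate (2010)": [4.4, 9.1, 10.2, 14.0, 16.5, 17.8, 21.3, 23.7],
    "Uninsured Rate (2015)": [2.8, 5.0, 6.1, 8.6, 10.9, 12.3, 15.5, 17.1],
})

features = ["Uninsured Rate (2010)", "Uninsured Rate (2015)"]
# .copy() gives an independent frame, so adding columns to it later
# does not trigger pandas' SettingWithCopyWarning.
X = df[features].copy()

for k in (2, 3):
    # Fit on the feature columns only, so labels from an earlier fit
    # never leak into a later one.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[features])
    X[f"Cluster (k={k})"] = kmeans.labels_

print(X)
```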

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Clustering

Density-based clustering groups points that lie in closely packed regions and treats points in sparse regions as noise. Because it does not assume that clusters are compact and centered on a single point, it can separate clusters that are densely grouped yet irregularly shaped, a task at which centroid-based methods such as K-means often fail. A quintessential example is shown below: two distinct clusters that conventional techniques struggle to separate.
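
The contrast can be reproduced on the classic two-moons toy dataset, as in the sketch below. The dataset and the `eps` and `min_samples` values are assumptions chosen for the toy data, not the settings used on the uninsured-rate records.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: dense, distinct, but not blob-shaped.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is how many neighbors a
# point needs within that radius to count as a core point.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN marks noise points with the label -1, so exclude it when counting.
n_clusters = len(set(db_labels.tolist()) - {-1})
print("DBSCAN clusters:", n_clusters)
print("DBSCAN ARI:", adjusted_rand_score(y_true, db_labels))
print("K-Means ARI:", adjusted_rand_score(y_true, km_labels))
```

The adjusted Rand index (ARI) measures agreement with the true moon membership, so a higher value for DBSCAN indicates that it follows the curved shapes more faithfully than K-means does.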

The plot suggests that most data points have been assigned to one large cluster (indicated by the colors towards the top of the color bar), with a few points classified either as noise or as smaller, separate clusters.

The plot also shows a general decrease in the uninsured rate from 2010 to 2015: the majority of points fall below the line y = x, and a point on that line would indicate no change. The plot does not show clearly distinct clusters, which suggests either that the data has no distinct groupings based on uninsured rates or that the parameters chosen for DBSCAN did not produce well-defined clusters.

Hierarchical Clustering

Hierarchical clustering is another notable method. It builds a hierarchy of clusters in one of two ways: bottom-up (agglomerative) or top-down (divisive). Points are grouped according to a chosen distance metric and a rule for measuring the distance between clusters, and both choices influence which clusters form. Hierarchical clustering also offers a useful way to determine the appropriate number of clusters, because the result can be inspected with a dendrogram, a tree-like diagram that visualizes the order in which clusters are merged and how far apart they were at each merge.

In our analysis, we will employ the bottom-up, or agglomerative, approach for building hierarchical clusters. This method starts by treating each data point as a separate cluster and progressively merges them based on their proximity, leading to a comprehensive hierarchy of clusters.
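
The merge loop described above can be sketched in a few lines. This naive version assumes single linkage (the distance between two clusters is the distance between their closest members) purely for simplicity; the linkage used in the analysis is not stated, and the function name `agglomerate` and the five points are hypothetical.

```python
import numpy as np

def agglomerate(points, n_clusters):
    """Naive single-linkage agglomerative clustering, for illustration only."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    np.linalg.norm(points[i] - points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

points = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8], [8.0, 8.0], [8.3, 7.9]])
result = agglomerate(points, n_clusters=2)
print(result)  # lists of point indices, one list per cluster
```

Stopping at two clusters leaves the three points near (1, 1) in one group and the two points near (8, 8) in the other.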

The visualization uses a color-coded legend to distinguish between two clusters: Cluster 0 is represented in blue and Cluster 1 in orange. The predominance of blue dots signifies that the agglomerative clustering algorithm has grouped the majority of data points into Cluster 0. In contrast, the orange dots, denoting Cluster 1, are sparse, suggesting they are outliers or unique cases rather than members of a closely connected cluster.

Examining the scatter of these points reveals that many 2015 values are lower than their 2010 counterparts, hinting at an overall decrease in uninsured rates. The clustering places the bulk of the data, which shares similar traits, in Cluster 0 and isolates a handful of data points in Cluster 1. These points may have been separated because of their comparatively higher uninsured rates in one or both years, which marks a divergence from the main group.

The dendrogram provides a flexible way to choose the number of clusters: the number depends on the height at which a horizontal ‘cut’ is placed, since every vertical line the cut crosses becomes one cluster. For instance, cutting across the tallest vertical lines, at a Euclidean distance of about 20, results in two broad clusters. Conversely, cutting at a lower level, such as a Euclidean distance of 5, yields a greater number of clusters.
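
Cutting a dendrogram at a given height can be done programmatically with SciPy, as sketched below. The random points are stand-ins for (2010 rate, 2015 rate) pairs and Ward linkage is an assumption, so the cluster counts printed here will differ from those read off the dendrogram above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for (2010 rate, 2015 rate) pairs; not the real data.
rng = np.random.default_rng(0)
X = rng.uniform(2, 25, size=(60, 2))

# Build the full merge tree once; Ward linkage is assumed here.
Z = linkage(X, method="ward")

counts = {}
for height in (20, 5):
    # criterion="distance" cuts the tree at the given height: clusters
    # that merge above the cut stay separate.
    labels = fcluster(Z, t=height, criterion="distance")
    counts[height] = len(set(labels.tolist()))
    print(f"cut at {height}: {counts[height]} clusters")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the corresponding tree, which makes it easy to see where each cut falls.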

Regarding our dataset on uninsured rates, a higher-level clustering could differentiate regions with significant disparities in their changes in uninsured rates. On the other hand, making finer cuts allows for the identification of more subtle variations within the data, revealing intricate patterns that might otherwise be overlooked.

Conclusion

Taking into account the various clustering techniques and their respective outcomes, it appears that hierarchical clustering yields the most clearly defined clusters, making it a strong candidate for predictive analysis. Despite the clusters not aligning perfectly with the initial labels, this method shows promise for document classification. For instance, a document detailing specific ACA facts and their impact on medical access in the U.S. could be accurately categorized into one of these clusters with a reasonably high probability, facilitating effective classification.